A Recorded Debating Dataset

Authors

  • Shachar Mirkin
  • Michal Jacovi
  • Tamar Lavee
  • Hong-Kwang Jeff Kuo
  • Samuel Thomas
  • Leslie Sager
  • Lili Kotlerman
  • Elad Venezian
  • Noam Slonim
Abstract

This paper describes an audio and textual dataset of debating speeches, a first-of-a-kind resource for the growing research field of computational argumentation and debating technologies. We detail the recording of speeches by professional debaters, the transcription of the speeches with an Automatic Speech Recognition (ASR) system, their subsequent automatic processing to produce a text that is more “NLP-friendly”, and, in parallel, the manual transcription of the speeches to produce gold-standard “reference” transcripts. We release speeches on various controversial topics, each in 5 formats corresponding to the different stages in the production of the data. The intention is to allow this resource to be used for multiple research purposes, be it adding in-domain training data for a debate-specific ASR system or applying argumentation mining to either noisy or clean debate transcripts. We intend to make further releases of this data in the future.

Similar resources

A Preliminary Study of Disputation Behavior in Online Debating Forum

In this paper, we propose a task for quality evaluation of disputing argument. In order to understand the disputation behavior, we propose three sub-tasks, detecting disagreement hierarchy, refutation method and argumentation strategy respectively. We first manually labeled a real dataset collected from an online debating forum. The dataset includes 45 disputing argument pairs. The annotation s...

A Shared Task on Argumentation Mining in Newspaper Editorials

This paper proposes a shared task for the identification of the argumentative structure in newspaper editorials. By the term “argumentative structure” we refer to the sequence of argumentative units in the text along with the relations between them. The main contribution is a large-scale dataset with more than 200 annotated editorials, which shall help argumentation mining researchers to evalua...

'UK-DALE': A dataset recording UK Domestic Appliance-Level Electricity demand and whole-house demand

Many countries are rolling out smart electricity meters. These measure a home’s total power demand. However, research into consumer behaviour suggests that consumers are best able to improve their energy efficiency when provided with itemised, appliance-by-appliance consumption information. Energy disaggregation is a computational technique for estimating appliance-by-appliance energy consumpti...

Organizational Capital, R&D Assets, and Offshore Outsourcing

The degree of offshore outsourcing in the high-tech industries has increased rapidly in past decades. Because of this trend, economists have been debating whether offshore outsourcing is hollowing out U.S. high-tech firms’ core competencies in intangibles. To contribute to the debate, I first develop a forward-looking profit model and use Compustat dataset to measure the capital stock and depre...

The IMHG dataset: A Multi-View Hand Gesture RGB-D Dataset for Human-Robot Interaction

Hand gestures are one of the natural forms of communication in human-robot interaction scenarios. They can be used to delegate tasks from a human to a robot. To facilitate human-like interaction with robots, a major requirement for advancing in this direction is the availability of a hand gesture dataset for judging the performance of the proposed algorithms. We present details of the Innsbruck...

Journal:
  • CoRR

Volume: abs/1709.06438

Publication date: 2017